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Abstract 

Genetic feedback loops in cells break detailed balance and involve bimolecular reactions; 
hence exact solutions revealing the nature of the stochastic fluctuations in these loops are 
lacking. We here consider the master equation for a gene regulatory feedback loop: a gene 
produces protein which then binds to the promoter of the same gene and regulates its 
expression. The protein degrades in its free and bound forms. This network breaks detailed 
balance and involves a single bimolecular reaction step. We provide an exact solution of the 
steady-state master equation for arbitrary values of the parameters, and present simplified 
solutions for a number of special cases. The full parametric dependence of the analytical 
non-equilibrium steady-state probability distribution is verified by direct numerical solution 
of the master equations. For the case where the degradation rate of bound and free protein 
is the same, our solution is at variance with a previous claim of an exact solution (Hornos 
et al, Phys. Rev. E 72, 051907 (2005) and subsequent studies). We show explicitly that 
this is due to an unphysical formulation of the underlying master equation in those studies. 



1 



I. INTRODUCTION 



Biochemical reaction networks underpin the robustness of cells to both internal and ex- 
ternal perturbations. Feedback and non-linearities make network behaviour hard to under- 
stand, and thus mathematical modelling of the networks can provide useful insights. Copy 
numbers of gene products, such as proteins, are often relatively small [l, 2|, which argues for 
a careful evaluation of the role of stochasticity. The importance of stochasticity is without 
doubt in gene expression, given that there are only one or two copies of most genes per cell 
[3]]. Modelling of the stochastic dynamics of networks is typically more involved than sets 
of deterministic rate equations. Exact solutions have been obtained for reaction networks 
obeying __detailed balance 4j-|6| and for those composed of first-order (unimolecular) reac- 



tions 
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ll| . However these restrictions are not typical of biochemical processes inside living 



cells. Detailed balance conditions are characteristic of closed systems of reversible chemical 
reactions in thermal equilibrium conditions; they only hold for open systems in special cases 
[4]. Living cells are open biochemical systems which actively exchange matter with their 
surroundings and which possess non-equilibrium steady states, and hence it is clear that the 
principle of detailed balance will not generally hold for intracellular biochemical systems 12]. 
It is also a fact that most systems of interest involve a number of second-order (bimolecular) 
reactions such as substrate-enzyme interactions, and protein-DNA interactions. 

Here we focus on perhaps the simplest example of a biochemical reaction network which 
overtly breaks detailed balance and which involves both unimolecular and bimolecular re- 
action steps. We consider a genetic regulatory network with a feedback loop, namely one 
in which the product of a gene binds to the promoter of that same gene, and regulates 
its expression. Furthermore the free and bound protein are assumed to be degraded via 
proteolysis. Note that while bound protein degradation is not as well known or obvious as 
free protein degradation, there are mechanisms which could mediate it, e.g. the ubiquitin- 
proteasome pathway can target and degrade parts of protein complexes [I3I . f^ . Similar 
feedback mechanisms as discussed above are ubiquitous in biolo gy, appearing in such di- 
verse contexts as metabolism 15], signaling 16j, somitogenesis [l2j and circadian clocks 



181 ] . Naturally, given its simplicity, special cases of this model have already been the sub 



ject of a number of studies. Hornos et al [jj]] and subsequent follow up studies |20H22| have 
claimed an exact solution for the case where the rate of bound protein degradation is equal 



to the rate of free protein degradation. Qian et al 23] have studied the case where the 
bound protein degradation rate is zero and developed an approximate solution of the master 
equation in the limits of slow and rapid switching of the gene between the unbound and 
bound states. A few studies have also considered variations of this simple model by allowing 



inhibition or activation by protein dimers 24|, |25 |. 



In this paper we obtain an exact solution of the master equation for arbitrary free and 
bound protein degradation rates in non-equilibrium steady state conditions. For the case 
of equal bound and free protein degradation rates, our solution differs markedly from the 



exact solution claimed by Hornos et al 



191 ]; we explicitly show that the difference between 



the two solutions stems from the fact that the master equations studied in the latter work 
have no consistent physical interpretation and hence constitute an incorrect description of 
the biochemical processes at play. 

In the next section we define the model and write down a master equation formulation 
of the stochastic dynamics. In section III we present the exact solution using the generating 
function method. In section IV we study three special cases in which detailed balance does 
not hold and contrast with the case in which detailed balance holds. In section V we directly 
compare our exact solution with numerical solutions of the master equations and show their 
correctness. In section VI we present a careful comparison of our exact solution to previous 
studies. We summarise our results and conclude in section VII. 



II. THE MODEL AND THE MASTER EQUATION 

We consider a single gene and its accompanying promoter region. Self-regulation means 
that the protein corresponding to the gene can bind to the promoter region and thereby affect 
the transcription and translation processes. Following previous work, we do not explicitly 
consider the transcription process, and the intermediate mRNA. Rather we model the pro- 
cess by following only the number of free proteins and the state of the promoter, namely, 
whether it is bound or unbound. It is important for us to carefully define the processes in 
order to make completely transparent how these will be encoded into a master equation. 
By 'free proteins' we mean proteins that have been created from transcription/translation 
of the gene in question, and which are neither bound to the promoter nor degraded. We 
only allow one protein to be bound to the promoter region at any given time. We do not 
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consider dimerization of free proteins. 



We define two conditional probabilities: Po(n,t)dt is the probability that in the time 
interval (t,t + St) there are n free proteins and the promoter is unbound, and Pi(n,t)dt is 
the probability that in the interval (t, t + St) there are n free proteins and the promoter is 
bound. In the latter case, there are in fact n + 1 proteins in the system, n of which are 
free, and one of which is bound to the promoter. The transcription process will be altered 
if the promoter is bound, and so the rate of production of proteins will depend on the state 
of the promoter region. This dependence of production rate on the promoter state breaks 
detailed balance and makes analytic solution of this problem non-trivial. We define r u and 
r& to be the production rates of protein given than the promoter region is unbound or bound 
respectively. We define kf to be the degradation rate of free proteins. We define k b to be the 
degradation rate of the bound protein. Allowing k b to be non-zero breaks detailed balance 
even if r u = r b and, again, makes analytic solution non-trivial, but not impossible as we 
shall see. Lastly, we define s b to be the binding rate per protein to the promoter region, 
and s u to be unbinding rate from the promoter region. Note, we assume that if the bound 
protein is degraded it is removed from the system: explicitly the total number of proteins will 
decrease by one, the total number of free proteins will remain unchanged, and the state of 
the promoter will change from bound to unbound. These processes and their accompanying 
rates are schematically illustrated in the following reaction scheme, where D u , D b , and P 
represent the unbound DNA, the bound DNA, and the free proteins, respectively: 



D U ^>D U + P, D h ^D b + P 



D b ^D u , D U + P^D 6 . 



Assuming that each process is an independent Poisson process allows us to encode the 



dynamics using master equations 



26( . The master equations for Pq and Pi have the following 



forms 



-P (n,t)=r u (P (n-l,t)-P (n,t)) 

+ kf {{n + l)P {n + 1, t) - nP (n, t)) 

+ k h P x {n,t) + SuP^n - l,t) - s b nP (n,t) , (2) 

-P x {n,t)=r h (P 1 {n-l,t)-P 1 {n,t)) 
at 

+ kf {{n + l)Pi(n + 1, t) - nPi(n, t)) 

- fc 6 Pi(n,t) -s u Pi(n,t) + s 6 (n + l)P (n + l,t) . (3) 

We have formatted these equations to make as clear as possible the relation of the terms to 
the corresponding molecular processes. Each line of the above equations corresponds to the 
processes described on the corresponding line of reaction scheme (111). The first line of each 
equation refers to processes originating from the gene, i.e., production of the protein. The 
second line of each equation refers to processes in the cytosol of the cell, i.e., degradation of 
free proteins. The third line of each equation refers to processes occurring on the promoter, 
i.e., degradation of the bound protein, and binding and unbinding of individual proteins on 
the promoter. 

There is no need to write special boundary conditions for these equations, so long as we 
impose P (n,t) = and Pi(n, t) = for n < 0. Explicitly, if we insert n = into the above 
equations we find 

j t P (0,t) = -r u P (n,t) + k f P (l,t) + hP^t) , (4) 
j t Pi(0,t) = -r 6 Px(n,t) + fc/Pi(l,t) - k b Pi(0,t) - s„Pi(0,t) + s b P (l,t) , (5) 

which correctly describe the time evolution of Po(0, t) and Pi(0, t). 

In the next section we present the exact solution of these equations in the steady-state 
using the generating function method. 

III. EXACT SOLUTION 

It is convenient to work with a dimensionless time variable, and so we choose to scale 
time by the rate of degradation of free proteins kf, i.e., r = kft. We define the dimensionless 
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rates 

Pu = r u /k f , p b = r b /k f , 9 = k b /k f , a u = s u /k f , a b = s b /k f . (6) 
For future reference we define the convenient parameters 

S b = 1 + a b , R = p u - p 6 S b . (7) 
In dimensionless variables, the master equations ($Z§ and (EJ) take the form 
-^-P (n, t) = p u (P (n - 1, r) - P (n, r)) 

GST 

+ ((n + l)P (n + 1, r) - n? (n, r)) 

+ OP^n, t) + a u Pi(r2 - 1, r) - a 6 nP (n, r) , (8) 

■j-Pi(n,T) = p b (Pi (n-l,r)-P 1 (n, r) ) 
(It 

+ ((n + l)Pi(n + 1, r) - nPi(n, r)) 

- 0P!(n, r) - ^(n, r) + ^(n + l)P (n + 1, r) . (9) 



23, 



28|. We define 



We will solve these equations using the generating function method 
the generating functions via 

oo 

G (2) = 5>"P (n) , (10) 

n=0 

oo 

G 1 (z) = J2 znp ^- ( n ) 

n=0 

Henceforth, we will work in the steady-state, and set dP (n)/dT = and dP\(n)/dr = 0. 
Summing the master equations © and Q over n with a weight of z n , one finds the coupled 
pair of first-order differential equations 

Pu (z - 1)G -{z- 1)G' + (6 + Ouz)^ - a b zG' = , (12) 
Pb (z - l)G l -(z- 1)G[ -(6 + a u )G l + a b G' = . (13) 

The obvious way to proceed is to write G\ in terms of Go and G' Q in Eq. (|12p and then 
substitute into Eq. ( TT~3T) . However, this leads to a second-order differential equation for 
Go which is not of the Riemann type, and so cannot be solved in terms of hypergeometric 
functions. 
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The less obvious way to proceed is as follows. We differentiate Eq. (TP2]) to get an equation 
involving Go, G' Q , Gq, G\ and G[, and then use (I12p again to eliminate Go i n favour of G' 
and G\. This leads us to an equation involving G' Q , Gq, G\ and G[. Finally by means of 
Eq. ( TT3"j) . G' Q and Gq can be expressed in terms of G± and its derivatives. This leads us to a 
second-order differential equation for G\ which reads 

A{z)G'[ + B(z)G\ + C(z)G 1 = , (14) 

with 

A(z) = l-E b z, (15) 

B{z) = ( Pu + p b T, b )z - ((1 + 6)Y> b + a u + p u + p b ) , (16) 

C(z) = p u (9 + a u + p b ) + p b Y, b - p u p b z . (17) 

Since Eq. (I14p has linear coefficients in z it can be transformed to the differential equa- 
tion for the confluent hypergeometric function. On writing G\(z) = e az G\{bz + c), and 
substituting into Eq. Q14p one can determine a, b, and c, which provides the following 
solution 

G^z) = e^G^w) , (18) 

where 

w = • (19) 



J b 



and G\{w) satisfies Kummer's equation 29| (i.e., the confluent hypergeometric differential 
equation) 

wG" + (P- w)G[ - aGi = , (20) 

with 

a = e+ (Ju{pu - pb) , (21) 
R 



f3 = l + 6 + ^r(<j u + p u -^) . 



and 

8 = 1 + v 

Eq. ( 1201) admits two independent solutions, the Kummer function M(a, (3, w) and the Tri- 
comi function U(a, (3, w). The latter is inadmissible as a solution for Gi, as we require that 
Pi (n) — > for n — > oo and that the sum over n of Pi (n) is finite. Thus, we have the exact 
solution of the generating function in the form 

Gi(z) = Ae pbZ M(a, 13, w) , (23) 



where A is a normalization constant. Referring to Eq. ( Tl"3]) we see that knowledge of G\(z) 
enables us to find an exact expression for dGo(z)/dz. Substituting Eq. fl23|) into Eq. (fT3]) 
and using the transformation properties of the Kummer function, we find 



dG (z) 
dz 



Ae p 



i, -- 



\M(a + l,^)-^M( a ,^)~>(a + U+l,i«) 
, - L) K L b p 



• (24) 



It is difficult to extract an explicit solution for Gq(z) by integrating this expression, and 
this is consistent with the fact that the second-order differential equation for Go is not of 
the Riemann form. However, as we shall see, this is not an impediment to finding explicit 
expressions for Po(n). 

We now proceed to obtain the probability distributions Po(n) and P\{n). The probability 
distribution P\ (n) can be retrieved from the generating function via 



(25) 



z=0 



Substituting Eq. (T2"3j) in the above equation leads us to 

\ ^ syn „n—m I \ \ a ) 



m=0 V D/ 



00). 



M[a + m, (3 + m, w ) 



(26) 



where we have defined Wq = w(0) = — i?/S^, the Pochhammer symbol (a) n = T(a + 
n)/r(a) = a (a + 1) . . . (a + n — 1), with (a)o = 1, and the combinatorial symbol C™ = 
n!/m!(n — m)\. 

Using the analogous expression to Eq. ([25]) for G and Pq, we can obtain Po{n) for n > 1 
by differentiating Eq. (1241) . with respect to z, (n — 1) times, dividing by n\ and setting 
z = 0. With some use of the transformation equations for the Kummer functions we find 
the compact expression 

n-l 



m=0 v 7 



i?\ m (a) 



G8)r 



(m + a) M(a + m + 1, /3 + m, w ) 



m + a + 



M(a + m, (3 + m, w ) 



.(27) 



for n > 1. This just leaves -Po(O), which can be found directly from the master equation (JSJ) 
on setting n = in the steady-state: 



p u P o (0) = P o (l) + ^i(0) 



(28) 
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On substituting the exact forms for Pi(0) and Pq(1) from Eqs. ( )26|) and (1271) respectively, 
we find 



P (0) = A 



S b a . , , ^ „ \ £7,, 



-Af (a + 1, /3, w ) - -^M (a, 0, w ) 



(29) 



:s 6 -i) Pu v u/ p 

Note, this last expression requires p u > for -Po(O) to be finite. In fact, this condition is 
required for the existence of a non-empty steady-state. If p u = then there is an absorbing 
state of zero proteins, and the steady-state in that case will correspond to an empty system. 

The evaluation of A requires us to normalise by calculating the sum over n of Po(n) and 
Pi(n) 

oo 

£(P„(n) + Pi(n)) = l. (30) 

n=0 

The sum over P\{n) is straightforward, since it is just G\{1). However, the sum over Po(n) 
cannot be reduced to a simple form as we do not have an explicit expression for Gq(z). One 
can perform the sum over n of Po(n) using the explicit expression (l2"7j) . but this cannot be 
reduced beyond definite integrals over Kummer functions whose form is apparently unknown. 
As such, the normalisation constant A is most easily found by numerically computing the 
sums over n of the explicit expressions for Po(n) and Pi(n). The probability that there are 
n proteins in steady-state conditions, P(n), is then given by the sum of Po(n) and P\{n). 
Ratios of moments such as the Fano factor, i.e. the variance of fluctuations divided by the 
mean number of molecules, do not depend on A and hence explicit expressions can always 
be written down for such quantities. 

We finish this section by noting that the exact expressions, Eqs. (1261) . ( |27l) and (1291) . 
are valid for R ^ 0. The singular case R = requires special attention and is treated in 
Appendix A. 



IV. SPECIAL CASES 

In this section, we provide results for four special cases which can be grouped into two 
classes: (i) the case of detailed balance (p u = pb and 9 = 0), in which Poisson statistics are 
expected to hold, (ii) three cases in which detailed balance does not hold: pb = 0, a = and 
a = p. In each of these cases one can write down simple expressions for the normalisation 
constant of the probability distribution and hence one can also obtain explicit expressions 
for all the moments of the distribution. In what follows we calculate the dependence of the 
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fraction of time for which the promoter is bound on the average number of free proteins. 
We denote the former quantity by f g, which is defined explicitly by 

oo 

/off = J]Pi(n), (31) 

n=0 

and denote the average number of free proteins by (n), which is defined explictly by 

oo 

(n} = ^ n(P Q (n) + P 1 (n)) . (32) 

n=0 

We will find that the functional dependence f s(( n )) under non-detailed balance conditions 
can differ markedly from the detailed balance case. 



A. Detailed balance conditions 



The self-regulating gene, described purely in terms of the states representing the number 
of free proteins, does not generally satisfy detailed balance. This has two causes: (i) the 
degradation of bound protein, and (ii) the differing rates at which proteins are produced 
depending on the state of the promoter. Thus, detailed balance can be restored by taking 
9 = and p u = pb (see for example 30] for a general discussion of detailed balance in a gene 
regulation context). In this case, production of protein is independent of the state of the 
promoter, and degradation occurs only in the free protein pool. As such, detailed balance 
holds, and, indeed, the distribution of free protein is trivially Poisson. A well known, yet 
non-trivial, observation is the simple relationship between / g and (n) in this case, which 



satisfies the Hill function 



3l| 



^ - • ( 33 ) 

This provides a test of our exact solution which we now confirm. We equate the scaled 
production rates to each other, defining in the process p = p u = pb- We first note from Eq. 
(l2Tj) that when p u = pb and 6 = 0, then a = 0. The Kummer function M(a = 0, 0, w) = 1, 
and so, from Eq. fl23|) we have 

G 1 (z) = Ae pz . (34) 
It is straightforward to integrate Eq. (124)) for G in this case, and one finds 

G (z) = A—e pz + B , (35) 

PCTb 
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where B is an integration constant. This constant can be fixed by imposing the condition 
(128]) and one finds B = 0. Normalisation fixes A, and one can then retrieve the explicit 
solutions for the probability distributions: 



I _|_ ESJl \ n - 



(T u 



e~ p o n 

= ^~^ n\ ■ (37) 



po-b 

Note, the distribution of free proteins, Po(n) + P\{n) = e~ p p n /n\: a Poisson distribution as 
anticipated. Summing these distributions over n with a weight of n, the average number of 
proteins is easily found to be (n) = p. Now the fraction of time for which the promoter is 
bound, /off, is equal to the probability that the gene is 'off', which is obtained by summing 
Eq. f )37|) over n. Expressing the latter in terms of (n) and reverting to unsealed rate 
parameters we obtain Eq. ( )33|) . as required. 

We finish this section by noting that the deterministic model of the genetic feedback 
loop predicts that f g has a Hill function dependence on (n) for all parameter values (see 
Appendix B). Hence for the detailed balance case, the predictions of the stochastic and 
deterministic models are in agreement. 

B. Non-detailed balance conditions 

1. The case Pb = 

This is the case of strong transcriptional repression since the production of the gene in 
the bound state is zero. In this case, we see from Eq. ( 124)) that the expression for dGo(z)/dz 
is a sum of Kummer functions, which can be directly integrated. On doing so, and, again, 
utilising the transformation formulae for Kummer functions, one finds the compact result 



G n (z) 



p u (S fe - 1) 



+ a u )E b M(a + 1, p, w) - a u (J: b - l)M(a, (3, w) 



(3* 



In principle, an unknown constant B should be added to this expression; however, one can 
fix this constant by utilising the relation (128 p and one finds that B = 0. Directly from Eq. 
(|23|) . we have 

Gx(z)=AM(a,P,w), (39) 
11 



where we have the simpler expressions a = 9 + a u and w(z) = p u (T, b z — l)/£jj; the parameter 
f3 is still given by Eq. ([22]) . 

It is now straightforward to determine the normalisation constant A by imposing condi- 
tion (1301) . which is equivalent to Gq(1) + Gi(1) = 1. One finds 



where 



,4 



A = Pu (E b - l)A 



+ a u )E b M(a + 1, 0, wt) + { Pu - ct u )(E 6 - l)M(a, (3, Wl ) 



(40) 



(41) 



b- 



and wi = w(l) = p u (£& - l)/£ 

The explicit forms for the probability distributions can be obtained directly from the 
general formulae (|26|) . (|27|) . and (|29l) . or from direct evaluation from the explicit generating 
functions ( |38l) and ( |39l) . In either case, one obtains 



nl 



(Pu" 


\ n 0)n r 







'a + n)S fe M(a + n + 1, fi + n, w ) — a u (T* b — l)M(a + n, (3 + n, w ) 

(42) 



and 



Pi{n) = -r 



^4 ( Pu\ n (a) n 



nl 



M(a + n, (3 + n, wq) 



(43) 



where in this case wo = w(0) = —p u /T^. Note that Eq. (112"]) for Po( ri ) is valid for all n > 0. 

The fraction of time for which the promoter is bound, / g, is given by G\(l), and has the 
form 

/ off = p u (E b - l)AM(a, (3, w x ) . (44) 

The mean number of free proteins, regardless of the state of the gene is given by G' (l) + 
G[(l) and has the form 



(n) = p u aA E b M(a + 1, (3, wt) - (E b - l)M(a, (3, 



(45) 



Unlike the detailed balance case, it does not seem generally possible to write f a g as a 
function of (n). The behavior for small and large (n) can however be easily deduced. It 
is clear that in the limit of large (n), f a g approaches one. The small (n) behavior can be 
inferred by a series expansion of Eqs. (jUJ) and (J15|) in powers of p u 

<?bPu 



off 



9T, b + a u 



+ O(pl) 



0h h + a„ 



(46) 
(47) 
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From these expressions, one can deduce that 

The intermediate (n) behavior is however unknown. The dependence of / Q g with (n) is most 
easily explored by numerically evaluating Eqs. (I44j) and ( 1431) for various values of p n . In 
Fig. 1 we show two plots generated in this manner, one for small and the other for 
large, both with 6 = 0. In each case we compare with the Hill function 



(n) 
(n) + 



/off = TTT^W (49) 



Note that this function's small and large (n) dependence are the same as those of the 
exact solution. Furthermore this function is the prediction of the deterministic model (see 
Appendix B). From Fig. 1, we see that the Hill function is a good approximation to the actual 
function for small 07,; in the opposite limit of large a^, the two functions are considerably 
different for intermediate (n). In this case the exact solution shows a piecewise linear form, 
with the linear dependence of / fj approximately holding until the function 'breaks' at its 
threshold value of unity. 

As a last comment in this subsection, defining by (n)o the average of n conditioned on 
the promoter being unbound, i.e. 



oc 



^2 n Po(n) 

(n)o = , (50) 

£ Po(n) 



n=0 



we find the simple form 



p u A(6 + a u )M(a, f3, wi) > . 

(n> ° = (T^M ■ (51) 



Comparing this with Eq. (l44p we have 



which can be inverted to obtain the curious result 

= < B)o + fi ? i ■ (53) 

Thus, an equation resembling the Hill function is found, but the average of n is replaced by 
the average of n conditioned on the promoter being unbound (i.e. the gene being switched 
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on). In fact, one can show that this relationship holds quite generally (i.e. for arbitrary 
values of p&) by summing the master equation (JS]) for Po(n) over n and using the definitions 
of / off and (n) . 



2. The case a = 

We saw in the subsection on detailed balance conditions that when a = the Kummer 
function is a constant, and G\ then simplifies to an exponential function, giving Poisson 
statistics for Pi(n) (and also for Po(n) in that case). The cases considered in this subsection 
and the next are two further cases in which the exact solution for G\ reduces to a pure 
exponential function. Note, we have previously defined R = p u — pb^b- We will not be using 
R for this case and the next one, as it is helpful to see the role of p u and pb explicitly. 

In setting a = 0, but without imposing the two conditions for detailed balance, we have 

G x {z) = Ae pbZ , (54) 

and Poisson statistics for P\{n). The condition a = can be written 

9 = ^ = Pb \ . (55) 
{p b E b - p u ) 

To hold, this relation requires a quite severe constraint on the range of p u , namely p b < 

p u < pb^b- Given the form of G\ in Eq. (13411 . one can use Eq. ( 1241) to retrieve the simple 

exponential form for Go and the normalisation condition and Eq. ( 1281) to fix the two arbitrary 
constants that arise. After some algebra one finds 

e~ Pb Ou 

P ( n ) = 1 ( 56 ) 

Pi(n) = ^ • (57) 



(Pb^b-Pu) t 

The mean number of free proteins has the particularly simple form (n) = Pb = fb/kf. 

In this and the next subsection the special cases impose non-trivial relationships between 
parameters (see Eqs 055p and 05 9p ). As such, these solutions are valid on hypersurfaces in 
parameter space. Because of this non-trivial relationship between parameters, one cannot 
change (n) through variation of a single parameter, and as such the functional relationship 
fos((n)) is of limited experimental interest, and thus we do not report such results here. 
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3. The case a = j3 



In setting a = (3 we can take advantage of the fact that M(a,a,w) 
using Eqs. ( TT5|) and f l2"3"j) we have 

Gi(z) = A'e p " z/E6 , 



|29|. Thus, 



(58) 



and, consequently, Poisson statistics for P\(ri). Using Eqs. f|2Tj) and fl22|) . the condition 
a = (3 translates to fixing <r u in terms of p u , Pb, and £&, independent of 9, as follows 

(p u - p b T, b )(El + p u (£ fe - 1)) 



<7„ 



(59) 



p u E b (£ 6 - 1) 

Note, this condition requires p u > Pb^b, and so this condition has no overlap with the 
condition a = studied above. The implicit reason for this stems from the definitions of a 
and f3: while the former can take a value of zero, the latter is always greater than 1. 

One can now substitute the expression for G\(z) into Eq. (I24p . and find the explicit form 
for Gq(z) 



G (z) 



A' 



Pu - Pb^b \ z + ^± (q (Pu ~ Pb^b] 
Pu J Pu\ PuiX'b - 1) 



(60) 



Note, the arbitrary constant which arises from integrating Eq. (1241) is found to be zero on 
application of the condition f l28|) . Given the z dependence of the prefactor to the exponential, 
Po(n) will not have a purely Poisson form. 

After some algebra, one finds the explicit forms for the probability distributions: 



Pain) 



A' 



P 1 (n) = A 
where 



(£ 6 " 1) 
/ (Pu/Efc 



Pu ~ Pb^b\ 



Pu 



m-1 



(n -I ! 



Pu 



[Pu - Pb^b) \ [pu/^bT 

p u (E 6 -l)y n\ 



IV. 



A' 



P^(S b -l) 2 e 



(61) 
(62) 

(63) 



Sfe \{p u ~ Pb^b) + Pu{^b - 1)(0 + Pu~ Pb)] 

Note, Eq. flBTj) holds for n = with the understanding that 1/ (—1)! = 0. 
Using these distributions, we find 

/ v _ ^ (p„ - p&E&)E& + p M (S fe - 1)(6> + p M - p fe ) 

U [ (Pu~ Pb^b) + Pu(^b - 1)(0 + Pu ~ Pb) 

Note that (n) > p u , and hence (n) > p u /^b which is the parameter in the Poisson-like 
distributions. 
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V. NUMERICAL VALIDATION OF THE EXACT SOLUTION 



In this section we numerically solve the master equations, Eqs. (jH!)-© an d compare 
with the exact solutions obtained in Section III. 2N difference equations are generated by 
substituting n = 0, 1, 2, N — 1 in Eqs. ([8])-(j9]) with the time derivative set to zero. The 
boundary conditions are set to Po(— 1) = Pi(— 1) = Pq(N) = Pi{N) = 0, taking into account 
also the absence of probability flux into Pq(N) and P\(N). This set of difference equations 
is solved simultaneously for P (0), Pi(0), P (l), Pi(l), P {N - l),Pi(iV - 1). Note that 
the exact solution corresponds to N equals positive infinity. Of course practically we are 
only interested in obtaining the probability distribution solution to some desired accuracy 
and hence it is sufficient to solve the equations for a large enough positive integer N. This 
should be chosen large enough such that the probability distribution solution Po(n) +Pi(n) 
smoothly decays to zero as n approaches N. 

We use the latter method with N = 500 to obtain the dependence of the steady-state 
probability distribution solution of the master equations, Eqs. (EJ)- (JHI) , on the five non- 
dimensional parameters 6, <Jb, &u, pb and p u . The same is obtained by means of the analytical 
solutions given by Eqs. (1261) . (j27|) and (1291) . The results from the two methods are compared 
in Fig. 2 where the open circles show the numerics and the crosses show the analytical 
solution. Note that in all cases, P(n) goes to zero for n much less than 500 and hence 
artificial boundary effects due to finite N should be negligible; indeed we verified that the 
probability distribution solution obtained with N = 1000 is indistinguishable from the one 
obtained with N = 500. Note that the numerics and the analytical solution are in perfect 
agreement which indeed verifies the correctness of the main result of this paper, namely the 
exact solution given by Eqs. fl26|) . (|27D and (1291). 

It is interesting that in (a), (b), as we gradually increase 9 and er& respectively, we observe 
a transition from a bimodal to a unimodal probability distribution while in (c) we see the 
reverse transition as the parameter p u is increased. The bimodal character of the distribution 
is particularly interesting since the deterministic model of the genetic feedback loop does not 
exhibit bistability (see Appendix B). The mechanism behind the origin of bimodality and the 
transition from bimodal to unimodal behavior can be inferred as follows. In cases (a)-(c), the 
peaks of the bimodal distribution occur at n ~ p u and at n ~ p& while the single peak of the 
unimodal distribution occurs at one of these two (depending on parameter values). Now the 

1(3 



processes D u -A- D n + P, and P — > considered by themselves lead to a Poisson distribution 
for the number of protein molecules with peak at n ~ p u in steady-state conditions; similarly 
a peak at n ~ p b can be associated with the processes D b ^D b + P, and P -A 0. Hence 
it is clear that bimodality occurs whenever the gene switches slowly between its bound and 
unbound states which leads to a switch between the two sets of reactions discussed above. 
Inspection of the reaction mechanism Eq. (JT|) shows that the switching rates increase with 
9, a b and o u . Hence if for some parameter set we have bimodality, increasing any one of the 
aforementioned three parameters will lead to a switch from bimodal to unimodal behavior; 
these are cases (a) and (b). If we have unimodality then the distribution will of course stay 
unimodal upon variation of one of the three parameters; this is case (d). Slow transitions 
between unbound and bound states are necessary but not sufficient to induce bimodality; 
the protein production rates of the two gene states must be sufficiently different such that 
the peaks of the Poisson distributions associated with each state are well separated; this is 
case (c). Cases (a)-(d) are ones in which the genetic feedback loop is negative, i.e. p b < p u . 
In contrast cases (e) and (f ) are for a positive feedback loop, i.e. p b > p u - As for the negative 
feedback case, both unimodal (case (e)) and bimodal behaviors (case (f)) are possible and 
their existence can be understood by the same switching mechanism elucidated above. For 
example the case 9 = in (f) is bimodal with peaks at n ~ p u and n ~ p b indicating slow 
switching between the steady-states of the bound and unbound genes. Increasing 9 leads 
to an increase in the switching rate from the bound to the unbound states explaining the 
increase in the size of the peak associated with the unbound state and the corresponding 
decrease of the peak size associated with the bound state. 



VI. CRITIQUE OF A PREVIOUS "EXACT" SOLUTION 

In this section we make a careful and explicit comparison of our master equations with 



those of Hornos et al |19| in the original paper claiming an exact solution to the problem of 
a self-regulating gene with the condition kf = k b (i.e. 9 = 1). In that work, the variable n 
of the probability distributions represents the total number of proteins in the system, that 
is the number of free proteins plus (when the promoter is bound) the bound protein. 

The following key provides the translation between the notation we have used in section 
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II and the notation of Hornos et al. 



P {n,t) <- 


-> a n (t) 


(65) 


Pi(n,t) ±- 


-> Pn+l{t) 


(66) 


r u <- 


->■ 0a 


(67) 


n <~ 


-> 08 


(68) 


kf <- 


-)• fc/ 


(69) 


h <- 


-> h 


(70) 


Su <~ 




(71) 


S b <- 




(72) 



Note, in their original work, Hornos et al assume from the outset that the rates of degradation 
of free and bound protein are the same, and use the symbol k for this degradation rate. Here, 
for clarity in the handling of these two different processes, we allow the rates to be different, 
and use the symbols kf and kb consistent with the notation used in section II, setting them 
equal eventually. 

With the key given above, the correct master equations (j2J) and ([3]) take the form 

^Oi n (t) = g a (a n -x - a n ) 

+ k f ((n + l)a n+ i - na n ) 

+ k b /3 n+ i + f/3 n - hna n , (73) 



d 
dt 



fin = g/3 (Pn-1 ~ fin) 

+ kf (n(3 n+1 - (n - l)(3 n ) 
- k b fi n - ffi n + hna n . 



(74) 



The first equation is valid for n > while the second equation is valid for n > 1. The 
boundary conditions Po(— l,t) = and P\{— l,t) = imply the new boundary conditions 
a-i =Po = 0. 

On setting the free and bound degradation rates to be equal kf = fc& = k, the master 
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equations above take the form 

+ k((n + l)a n+1 - na n ) 

+ k(3 n+1 + fp n - hna n , (75) 

^/^n = 9(3 {Pn-1 ~ Pn) 

+ k(n(3 n+1 -(n-l)p n ) 

- k(3 n - f(3 n + hna n . (76) 

We are now in a position to directly compare these master equations with those of Hornos 
et al. Equations (1) and (2) of that paper read (for n > 0) 

4i a n{t) = 9a («n-l ~ «n) 

at 

+ k((n + l)a n+ x - na n ) 

+ f/3 n - hna n , (77) 

^/^n = 9/3 (Ai-l ~ fin) 

+ k {n(3 n+l - (n - l)/3 n ) 

+ k {P n+ i - p n ) - fp n + hna n . (78) 

Obviously, in the second and third lines of Eq. (|78|) one can combine the terms with a 
prefactor of k to give k((n + l)/3 n +i — n(3 n ), but we have separated these terms to emphasise 
the comparison to the correct equation (I76p in which, as stressed in section II, each line of 
the equation corresponds to processes occurring on the gene, the cytosol, and the promoter 
respectively. In order to conserve probability, Hornos et al are obliged to write a separate 
equation for n = which reads 

-jj_a (t) = -g a a + fc(ai + Pi) , (79) 

which is consistent with Eq. (1751) for n = and using the boundary condition a_i = 0. 
They also enforce the boundary condition p = 0. The same master equations, Eqs. (ITT]) 



and ( JTB]) . appear also in several follow up papers [20l-|22] 
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Let us denote the master equations (1751) and (176|) . as ME(GSN), and the master equations 
flZTJ and (JTSD as ME(H-ET-AL). Clearly ME(GSN) and ME(H-ET-AL) are different, and 
given they purport to describe the same process, they cannot both be correct. The difference 
between the two sets of equations is in how degradation of the bound protein is described. In 
our master equations ME(GSN), which were derived from Eqs. (T2J) and ([3]) by use of the key, 
if the bound protein is degraded, the system returns to the unbound state, i.e., that process 
connects the two conditional probabilities a n and n +i- in ME(H-ET-AL), degradation of 
the bound protein keeps the system in the bound state. How is this possible? Literally, the 
ME(H-ET-AL) master equations are describing bound protein degradation as the following 
composite of processes: degradation of the bound protein, with instantaneous rebinding of 
a protein to the promoter from the free protein pool, such that the state fi n +i transitions to 
the state (3 n . This composite of processes occurs with a constant rate k, independent of the 
number of free proteins. Clearly this composite of processes is not the intended instantiation 
of the self-regulating gene, and furthermore, it is difficult to imagine any physical process 
that could operate in this manner. One could concoct an imaginary process using a Maxwell 
Demon, a tiny animated figure, who sits by the promoter with a free protein always to hand, 
to instantaneously latch onto the promoter should the bound protein be degraded, but such 
constructions belong to philosophical discussions of irreversibility, and are not relevant to 
the biological question at hand. 

One might argue that the manner in which the bound protein degradation is handled has 
little bearing on the form of the distributions. To test this, we used the method expounded 
in Sec. V to numerically solve the two master equations, ME(GSN) and ME(H-ET-AL), in 
steady-state conditions for four different sets of parameter values. The results are shown in 
Fig. 3, where the open circles show the predictions of ME(GSN) while the solid lines show 
the prediction of ME(H-ET-AL). The parameters g a , gp, and k are, in all cases, equal to 80.0, 
0.0, and 1.0 respectively, while h = 0.001A and / = 0.1A where A takes the values 0.01 in 
Fig. 3(a), 1 in Fig. 3(b), 10 in Fig. 3(c) and 100 in Fig. 3(d). Note that as A is varied from 
0.01 to 100, the ME(H-ET-AL) predicts a transition from unimodal to bimodal probability 
distribution and back to unimodal distribution. However no such transitions are seen in 
ME(GSN). Note that the case A = 1 in Fig. 3(b) corresponds to the exact set of parameters 
used in the case u = 0.1 in Fig. 1 of the Hornos et al paper 19|. The good agreement 



between ME(GSN) and ME(H-ET-AL) for cases (a) and (d) can be explained as follows. In 
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case (a), the rate of protein binding to the promoter is very small, the gene is most of the 
time in the unbound state and hence bound protein degradation rarely occurs. In case (d), 
the gene switches between the unbound and unbound states very rapidly but its decay to the 
unbound state occurs primarily via the reaction — > D u + P and only occasionally via the 
reaction — >■ D u . In both cases, bound protein degradation is a rare event compared to the 
other molecular processes and hence it follows that the form of the probability distribution 
is practically insensitive to whether bound protein degradation is correctly or incorrectly 
described. In cases (b) and (c), bound protein degradation events occur at a frequency 
comparable with that of other molecular processes, and hence its correct description using 
the ME(GSN) becomes crucial to obtaining the correct steady-state probability distribution. 
The discussion on the role of 9, particularly when it approaches unity, explains why the 
bimodal distribution is not to be expected in a correct formulation of this problem. 

It is worth mentioning that a cursory comparison of our results with those of Hornos et 
al indicates some similarities, which are in some cases superficial. For example, Hornos et 
al find Kummer function solutions for the generating functions. Note, in the solution of 
ME(H-ET-AL), it is the generating function conditioned on the unbound promoter (Go in 
our notation) which is expressed in terms of a single Kummer function, while the generating 
function conditioned on the bound promoter (G\ in our notation) is expressed in terms of a 
sum of Kummer functions. In our exact solution of the correct master equation ME(GSN) 
we find that G\ is the generating function which has the simple form in terms of a single 
Kummer function, as shown in Eq. f[23|) . while Go involves integrals of Kummer functions, 
and does not have a simple closed form expression. We also note that Figure 2 of Hornos 
et al shows a plot of / ff versus (n), which indeed continuously varies from the Hill form 
to a linear piece- wise form as aj, is decreased, similar to our result shown in Figure 1. In 
the caption of Figure 2 of their paper, the curious relation between f Q g and (n) is noted, 
similar to our Eq. ( 15 3j) . although the combination (# + cr u )/<7b in our equation is replaced by 
o u j Oft in theirs, indicating again the incorrect handling of protein degradation in that work. 

Qian et al {23 1 studied the case where the bound protein does not degrade (i.e. 6 = 0). 
The coupled master equations, Eqs. 17(a)-17(b) in their paper, are the same as our master 
equations, Eqs. (1731) -(17^1). with k b = 0. Hence their master equations are correct for this 
special case. However they do not solve these equations exactly. Rather they derive an 
approximative solution in the limit that the gene switches between the D u and D b states 
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very rapidly and in the limit that the switching occurs very slowly. In the latter limit, i.e., 
the limit of small protein binding and unbinding rates to the promoter, they show that the 
probability distribution is bimodal if the production rates of the gene in the bound and 
unbound states are sufficiently different. This is confirmed by our exact solution, see the 
case 9 = in Fig. 2 (a). We find that this bimodal behaviour disappears when 9 is increased 
from to 1 (see Fig. 2 (b) and (c)). 



VII. CONCLUSIONS 



In this paper we have presented an exact solution for the simplest model of a self- 
regulating gene. Our solution is valid for an arbitrary degradation rate of the bound protein, 
which we have denoted throughout the paper by the parameter 9 in dimensionless units. The 
explicit solution for the probability distributions for the number of free proteins, conditioned 
on the gene's promoter being bound or unbound, are given by Eqs. f[2B|) and 02 7p respec- 
tively. It is interesting that an exact solution for this general case can be found given that 
the model breaks detailed balance and includes a bimolecular reaction step, features not 
typical of the exact solutions reported in the literature (4-9, [ll]. In particular, to the best 
of our knowledge, our exact solution is a first for a gene regulatory network with a feedback 
loop. As we have shown, the previous exact solution claimed by Hornos et al (l9| for the 
special case in which bound and free protein degradation are equal is incorrect because it 
is based on master equations which possess no coherent physical interpretation. The only 
other exact solution known in the context of stochastic gene expression is that derived by 
Shahrezaei and Swain for a model of gene expression involving first-order processes describ- 
ing transcription, translation, protein degradation and mRNA degradation but no feedback 
loop [11]. 

We anticipate that our exact solution will be useful to explore the dependence of com- 
mon ex per imental measures of noise intensity, e.g., the coefficient of variation 32| and Fano 
factors 33| , on various parameter values and on the nature of the feedback loop (repressing 



or activating). This may lead to insights into the mechanisms used by cells to regulate 
fluctuations in the protein concentrations, a topic of intense current research [jy, 35]. Other 
interesting avenues of research, which we have briefly touched upon in this paper, are the 
investigation of the transition from unimodal to bimodal protein distributions, and of de- 
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viations from the Hill function describing activation or repression of the gene. Our exact 
solution will enable a more thorough analysis of these topics which could previously only be 
investigated by means of approximation methods in some restricted parameter regimes. 

For chemical systems involving bimolecular reactions, the moment equations obtained 
from the master equation cannot generally be solved in closed form, and thus various approx- 
imations of the master equation have been developed. In particular for systems composed 
of unimolecular and bimolecular reactions, the time-evolution equation of the M th central 
moment of the probability density function solution of the master equation is generally a 
function of the (M + l) th central moment. This implies an infinite hierarchy of coupled equa- 



tions which cannot be generally solved 



36] , although see 37| for an exception. In the limit of 



intermediate or large numbers of molecules, various methods have been developed to obtain 



approximate expressions 



system-size expansion 
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or the moments (examples of two widely used methods are the 



38M42| and moment-closure approximations 36|, |43M46| ) . How- 



ever the reliability and accuracy of these methods when applied to systems characterized by 
low copy number of molecules and bimolecular reaction steps has remained an outstanding 
question of practical interest. Hence we anticipate that our exact solution will also provide 
a useful benchmark with which to compare the gamut of approximation methods used to 
estimate the effect of noise in biochemical systems. 
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Appendix A: The case R=0 

The exact solution presented in Eqs. (EES}, (}2"Tj) . ( I2"2"j) and ( 12"3"|) assumes that the parameter 
R = p u — pfeSb is different to zero. The case R = requires a separate analysis, either by 
taking the Kummer function solution ([23]) and taking a careful limit, or else by reexamining 
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the second-order differential equation (fl4{) explicitly for R = 0. For illustrative purposes, we 
demonstrate the second method here, and derive an exact expression for G\(z) and Pi(n). 
These calculations can be extended to derive the exact form for Po(n), if desired by the 
reader. 

Starting with Eq. fll4l) we set R = by writing p^ = p u /T,b. We make the double trans- 
formation 

Gi{z) =exp( Pu /£ 6 ) Gi{z) , (Al) 

and 

u=(z- 1/S,) 1 / 2 , (A2) 
to obtain the differential equation for G\(u) which has the form 



d 2 G 1 (27 + I) dG 1 
du 2 u du 

where 



- AGi = , (A3) 



Ou Puj^b ~ 1) 

and 



1 = + (A4) 



A = Ap '"' {1 i b - l) . (A5) 

I 1 

Eq. ( 1A3I) can be directly related to the differential equation for the Bessel function [29(, and 
we have 

G x {z) = Au" 7 / 7 (A 1/2 n) , (A6) 

where A is a normalisation constant and 7 7 is the modified Bessel function. Note, i_ 7 is 
an independent solution (for 7 not equal to an integer), but can be discarded, since its 
asymptotic properties lead to a non-normalisable probability distribution. 

In order to derive an explicit form for P\{n) we use Eq. (1251) . and the differentiation 



formula 



291 

ld_\ n 

v dv 



iTTJ» = v^- n I 1+n {v) . (A7) 

V Lb V / 

This provides the final result 

n , \ n-m / \\m / v n (m+ 7 )/2 

^») = ^rE^(^) (5) (t) ^ ■ < A8 > 



n ^ m VS b / V2/ VA 

m=0 V "/ V / \ 

where A' is a normalisation constant. The reason the Bessel function J appears in Pi rather 
than the modified Bessel function J, is due to the argument of Eq. ( 1A6|) becoming imaginary 
when z — 0. 



26 



Appendix B: Mean-field theory 



We consider a fictitious experiment in which the cell membranes of a large number of 
identical cells enclosed in some reaction volume Q are dissolved such that the genes and 
proteins from each individual cell can interact with that from every other cell. The well- 
mixed dynamics of this reaction system is deterministic (due to the large number of cells) and 
the state of the system at any point in time is described by two variables: the concentration 
of unbound DNA molecules, </>„, and of the protein, 0. Note that the concentration of bound 
DNA molecules is <p b = <pT — <Pu (where <pT is the concentration of bound and unbound DNA) 
and is hence not an independent variable. The conventional mass-action rate equations for 
the concentrations can be directly deduced by inspection of the reaction scheme Eq. (TTJ) and 
are given by 

dt<p u = h{<t>T - <Pu) - k<p u (j) + s u (4> T - <j> u ), (Bl) 
d t (f) = r u (p u + r b ((f) T - (f) u ) - k f (f) - k<f> u <f> + s u ((f) T - cf) u ). (B2) 

Note that the bimolecular rate constant k is not s b but rather is equal to Sf,Sl. This is 
since s b is a transition rate with units of inverse time and hence is in reality equal to the 
macroscopic rate constant k (with units of volume divided by time) divided by the reaction 
volume. Note also that these equations are based on the implicit assumption that the 
covariance of fluctuations in the number of molecules of any pair of species is zero, i.e. 
fluctuations are not important. 

In order to compare with the single gene results derived in the main text, we first multiply 
the above equations by the reaction volume Q and then set this volume equal to the cellular 
volume, i.e. Q<pT = 1, which leads to mean- field equations for the average molecule numbers 
of unbound DNA (n u ) and of protein (n) in a cell: 

d t (n u ) = k b (l - (n u )) - s b {n u )(n) + s u (l - (n u )), (B3) 
d t (n) = r u (n u ) + r b (l - (n u )) - k f (n) - s b (n u )(n) + s u (l - (n u )). (B4) 

These equations admit a single steady-state solution for (n u ) and (n) implying that the 
deterministic model does not predict bistability. 

Within this approach, the quantity 1 — (n u ) can be interpreted as the fraction of time that 
the promoter is bound, f g. Substituting in the latter equation, the steady-state solution 
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of Eq. (1B3j) for (n u ) and using the dimensionless rates as given by Eq. ([6]), one obtains the 
Hill equation 



<n> 

(n) 



/off = ttt^t- ( B5 ) 
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FIG. 1. Plot of / fj as a function of (n) for the case pi, = 0. The solid lines are generated by 
evaluating Eqs. (|44|) and ([35]) for varying values of p u . We also plot, for comparison, the Hill 
function (dashed lines) given by Eq. ([49 p . whose small and large (n) limits agree with those 
of the actual function (the one given by solid lines). Note that the Hill function is also the 
prediction of the deterministic model of the genetic feedback loop (Appendix B). The parameters 
are = 0.01, a u = 2 in (a) and a& = 2, a u = 0.01 in (b). In both cases 9 = 0. Note that the Hill 
function approximates well the actual function for small a^] this is generally the case for all 9. 
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FIG. 2. Dependence of the steady-state probability distribution on the non-dimensional parameters 
9 (panel a and f), Cf, (panel b), p u (panel c), a u (panel d), and (panel e). The distributions 
are obtained by numerically integrating the master equations, Eqs. (jH| and ((2]) (shown by the 
open circles), and by evaluating the analytical solutions, Eqs. ([26]) . ([271) and (f29l) (shown by the 
crosses). The perfect agreement of the two verifies that the latter equations are an exact solution 
of the master equation for the self-regulating gene. The solid lines are a guide to the eye. 
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FIG. 3. Comparison of the steady-state probability distributions as predicted by the master equa- 
tions ME(GSN) (open circles) and ME(H-ET-AL) (solid lines) . The distributions are obtained 
by numerically integrating the two master equations to obtain P(n) = a n + /3 n +i as a function of 
n, the number of free proteins. The binding and unbinding rates of the protein to the promoter 
region, h and / respectively, take progressively larger values as we go from (a) to (d), while the 
rest of the parameters are fixed. Explicitly, g a , gn, and k are, in all cases, equal to 80.0, 0.0, and 
1.0 respectively, while h = 0.001A and / = 0.1A where A takes the values 0.01 in (a), 1 in (b), 
10 in (c) and 100 in (d). Note that the ME(H-ET-AL) predicts a transition from unimodal (a) 
to bimodal (b) and back to unimodal probability distribution (c) and (d), while the ME(GSN) 
predicts a unimodal distribution in all cases. This example shows that the incorrect handling of 
bound protein degradation by the ME(H-ET-AL) leads to qualitatively incorrect features of the 
steady-state probability distribution. 
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